Post-hoc LLM-Supported Debugging of Distributed Processes
Schiese, Dennis, Both, Andreas
In this paper, we address the problem of manual debugging, which remains resource-intensive and, in parts, archaic. This problem is especially evident in increasingly complex and distributed software systems. The objective of this work is therefore to introduce an approach that can, in principle, be applied to any system, at both the macro- and micro-level, to ease the debugging process. The approach combines a system's process data with generative AI to generate natural-language explanations. These explanations are generated from the actual process data, interface information, and documentation, and guide developers more efficiently toward understanding the behavior and possible errors of a process and its sub-processes. We present a demonstrator that applies this approach to a component-based Java system; the approach itself, however, is language-agnostic. Ideally, the generated explanations provide a good understanding of the process even to developers who are not familiar with all the details of the considered system. Our demonstrator is provided as a freely accessible, open-source web application.
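The abstract names three information sources that feed the explanation step: process data, interface information, and documentation. A minimal sketch of how these might be assembled into a single prompt for a generative model is shown below; the function name and prompt wording are illustrative assumptions, not the authors' implementation (their demonstrator targets a Java system, Python is used here only for brevity).

```python
# Illustrative sketch (not the authors' implementation): combining the three
# information sources named in the abstract into one prompt for a generative model.
def build_debug_prompt(process_data: str, interface_info: str, documentation: str) -> str:
    return (
        "You are helping a developer debug a distributed process.\n\n"
        f"Process data (logs, inputs, outputs):\n{process_data}\n\n"
        f"Interface information:\n{interface_info}\n\n"
        f"Documentation:\n{documentation}\n\n"
        "Explain, in natural language, what the process and its sub-processes "
        "did and where a possible error may lie."
    )
```

The resulting string would then be sent to whichever LLM backend the system uses.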
Coverage Metrics for a Scenario Database for the Scenario-Based Assessment of Automated Driving Systems
de Gelder, Erwin, Buermann, Maren, Camp, Olaf Op den
Automated Driving Systems (ADSs) have the potential to make mobility services available and safe for all. A multi-pillar Safety Assessment Framework (SAF) has been proposed for the type-approval process of ADSs. The SAF requires that the test scenarios for an ADS adequately cover its Operational Design Domain (ODD). A common method for generating test scenarios is to base them on scenarios identified and characterized from driving data. This work addresses two questions that arise when collecting scenarios from driving data. First, do the collected scenarios cover all relevant aspects of the ADS' ODD? Second, do the collected scenarios cover all relevant aspects present in the driving data, such that no potentially important situations are missed? This work proposes coverage metrics that provide a quantitative answer to these questions. The proposed metrics are illustrated by means of an experiment in which over 200,000 scenarios from 10 different scenario categories are collected from the HighD data set. The experiment demonstrates that a coverage of 100 % can be achieved under certain conditions, and it also identifies which data and scenarios could be added to enhance the coverage outcomes when 100 % coverage has not been achieved. While this work presents metrics for quantifying the coverage of driving data and the identified scenarios, the paper concludes with future research directions, including the quantification of the completeness of driving data and the identified scenarios.
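To make the idea of scenario-category coverage concrete, here is a toy, set-based ratio over scenario categories; the category names and the metric itself are assumptions for illustration only and do not reproduce the metrics proposed in the paper.

```python
def category_coverage(odd_categories, collected_scenarios):
    """Fraction of required ODD scenario categories that occur at least once
    in the collected scenarios. A toy set-based metric for illustration,
    not the metric proposed in the paper."""
    required = set(odd_categories)
    observed = {s["category"] for s in collected_scenarios}
    return len(required & observed) / len(required)
```

For example, if the ODD requires the (hypothetical) categories "cut-in" and "hard braking" but the driving data only yields cut-in scenarios, the metric reports a coverage of 0.5, flagging the missing category.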
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation
Recently, natural language generation (NLG) evaluation has shifted from a single-aspect to a multi-aspect paradigm, allowing for more accurate assessment. Large language models (LLMs) achieve superior performance on various NLG evaluation tasks. However, current work often employs an LLM to evaluate different aspects independently, which largely ignores the rich correlations between aspects. To fill this research gap, we propose an NLG evaluation metric called CoAScore. Powered by LLMs, CoAScore utilizes multi-aspect knowledge through a Chain-of-Aspects (CoA) prompting framework when assessing the quality of a given aspect. Specifically, for a given aspect to evaluate, we first prompt the LLM to generate a chain of aspects that are relevant to the target aspect and could be useful for the evaluation. We then collect evaluation scores for each generated aspect and, finally, leverage the knowledge of these aspects to improve the evaluation of the target aspect. We evaluate CoAScore across five NLG evaluation tasks (e.g., summarization, dialog response generation) and nine aspects (e.g., overall quality, relevance, coherence). Our experimental findings highlight that, in comparison to individual aspect evaluation, CoAScore exhibits a higher correlation with human judgments, significantly outperforming existing unsupervised evaluation metrics, whether for assessing overall quality or other aspects. We also conducted extensive ablation studies to validate the effectiveness of the three stages within the CoAScore framework and case studies to show how the LLM performs in these stages. Our code and scripts are available.
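The three stages described above (generate relevant aspects, score each of them, then score the target aspect conditioned on those scores) can be sketched as follows. The prompt wording, the 1-to-5 scale, and the generic `llm` callable are assumptions for illustration, not the authors' exact prompts:

```python
def coa_score(text: str, target_aspect: str, llm, n_aspects: int = 3) -> float:
    """Sketch of a Chain-of-Aspects evaluation; `llm` is any prompt -> str callable."""
    # Stage 1: generate a chain of aspects relevant to the target aspect.
    raw = llm(f"List {n_aspects} aspects relevant to evaluating "
              f"'{target_aspect}', one per line.")
    aspects = [a.strip() for a in raw.splitlines() if a.strip()]

    # Stage 2: collect an evaluation score for each generated aspect.
    scores = {a: float(llm(f"Rate the text on '{a}' from 1 to 5:\n{text}"))
              for a in aspects}

    # Stage 3: evaluate the target aspect, conditioning on the aspect scores.
    context = "\n".join(f"{a}: {s:g}" for a, s in scores.items())
    return float(llm(
        f"Given these aspect ratings:\n{context}\n"
        f"Rate the text on '{target_aspect}' from 1 to 5:\n{text}"
    ))
```

Keeping the LLM behind a plain callable makes the pipeline easy to test with a stubbed model and to swap between backends.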
Interview with Safa Alver: Scalable and robust planning in lifelong reinforcement learning
In their paper Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning, Safa Alver and Doina Precup introduced special kinds of models that allow for performing scalable and robust planning in lifelong reinforcement learning scenarios. In this interview, Safa Alver tells us more about this work. It has long been argued that in order for reinforcement learning (RL) agents to perform well in lifelong RL (LRL) scenarios (which are scenarios like the ones we, biological agents, encounter in real life), they should be able to learn a model of their environment, which allows for advanced computational abilities such as counterfactual reasoning and fast re-planning. Even though this is a widely accepted view in the community, the question of what kinds of models would be better suited for performing LRL still remains unanswered. As LRL scenarios involve large environments with lots of irrelevant aspects and environments with unexpected distribution shifts, directly applying the ideas developed in the classical model-based RL literature to these scenarios is likely to lead to catastrophic results in building scalable and robust lifelong learning agents.
Establishing Meta-Decision-Making for AI: An Ontology of Relevance, Representation and Reasoning
Badea, Cosmin, Gilpin, Leilani
Making good decisions is a very important part of constructing good Artificial Intelligence (AI). However, there is an important distinction between decision-making itself and reasoning about decision-making, similarly to the distinction between (normative) ethics and metaethics. We believe more focus in the areas of automated decision-making, anticipatory thinking and cognitive systems ought to be explicitly given to discussing and deciding upon the characteristics of good decision-making systems and how best to build them. One way to deal with or preempt failure in such a system is to use preferences and rule-based decision-making (Dietrich and List 2013). For example, in the field of moral reasoning, there is value-based decision-making with a rule-based implementation (Badea 2020). The focus of such works is generally on the preference ordering on the values (the Representation step we discuss below), or on the ordering on the rules (the Reasoning step below). We will use this implementation from (Badea 2020) as a running example.